Created attachment 1213451 [details]
Screen shot of the directories at the backend.

Description of problem:
Kill two bricks while rm -rf is in progress, then bring the bricks back online: the directories get deleted from the mount point but remain on the backend bricks.

Version-Release number of selected component (if applicable):
[root@dhcp47-143 gluster]# rpm -qa | grep gluster
glusterfs-libs-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
python-gluster-3.8.4-2.26.git0a405a4.el7rhgs.noarch
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-api-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.el7rhgs.noarch
glusterfs-client-xlators-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
glusterfs-server-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
glusterfs-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
glusterfs-cli-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
gluster-nagios-addons-0.2.7-1.el7rhgs.x86_64
glusterfs-fuse-3.8.4-2.26.git0a405a4.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-2.26.git0a405a4.el7rhgs.x86_64

How reproducible:
100%. Hit 2/2.

Logs are kept at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/KaranS/nodeletion/

Tried this with MDCACHE on and off; the issue is hit in both scenarios.

Steps to Reproduce:
1. Create a 1 x (2+1) volume (gluster).
2. Create directories folder{1..10000}.
3. add-brick: add 3 bricks to the volume, making it 2 x (2+1).
4. Do fix-layout and let it complete.
5. Create directories explorer{1..10000}.
6. Do rm -rf explorer*. While it is in progress, kill the first two bricks of the subvol, i.e. the data bricks.
7. You will start seeing EROFS and "transport endpoint not connected" errors.
8. Bring the bricks back online using start force.
9. Check heal info of the volume. You will see healing in progress for the directories that hit EROFS errors on the mount point.
10. Now delete all the contents from the mount point using rm -rf *.

Actual results:
1) Heal info shows no entries to be healed.
2) Directories are present on the bricks while no content is on the mount point.
Expected results:
No directories should be present on the backend bricks.

Additional info:
[root@dhcp47-143 gluster]# gluster volume info

Volume Name: arbiter
Type: Distributed-Replicate
Volume ID: b19d2f27-b079-46f7-83f7-4d62d9efbad4
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x (2 + 1) = 12
Transport-type: tcp
Bricks:
Brick1: dhcp47-141.lab.eng.blr.redhat.com:/bricks/brick0/gluster
Brick2: dhcp47-143.lab.eng.blr.redhat.com:/bricks/brick0/gluster
Brick3: dhcp47-144.lab.eng.blr.redhat.com:/bricks/brick0/gluster (arbiter)
Brick4: dhcp47-141.lab.eng.blr.redhat.com:/bricks/brick1/gluster
Brick5: dhcp47-143.lab.eng.blr.redhat.com:/bricks/brick1/gluster
Brick6: dhcp47-144.lab.eng.blr.redhat.com:/bricks/brick1/gluster (arbiter)
Brick7: dhcp47-141.lab.eng.blr.redhat.com:/bricks/brick2/gluster
Brick8: dhcp47-143.lab.eng.blr.redhat.com:/bricks/brick2/gluster
Brick9: dhcp47-144.lab.eng.blr.redhat.com:/bricks/brick2/gluster (arbiter)
Brick10: dhcp47-141.lab.eng.blr.redhat.com:/bricks/brick3/gluster
Brick11: dhcp47-143.lab.eng.blr.redhat.com:/bricks/brick3/gluster
Brick12: dhcp47-144.lab.eng.blr.redhat.com:/bricks/brick3/gluster (arbiter)
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
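The reproduction steps above can be sketched with gluster CLI commands roughly as follows. This is a hedged sketch, not the exact test setup: the hostnames n1/n2/n3, the volume name arbvol, and the brick paths are placeholders.

```shell
# Sketch of the reproduction (hosts n1/n2/n3 and paths are placeholders).
# 1 x (2+1): two data bricks plus one arbiter brick per subvolume.
gluster volume create arbvol replica 3 arbiter 1 \
    n1:/bricks/b0 n2:/bricks/b0 n3:/bricks/b0
gluster volume start arbvol
mount -t glusterfs n1:/arbvol /mnt/arbvol

mkdir /mnt/arbvol/folder{1..10000}

# Expand to 2 x (2+1) and fix the layout before creating more entries.
gluster volume add-brick arbvol n1:/bricks/b1 n2:/bricks/b1 n3:/bricks/b1
gluster volume rebalance arbvol fix-layout start

mkdir /mnt/arbvol/explorer{1..10000}
rm -rf /mnt/arbvol/explorer* &
# ...while the rm runs, kill the two data brick processes of subvol-0,
# then restart them and check pending heals:
gluster volume start arbvol force
gluster volume heal arbvol info
```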
After testing a few scenarios post the discussion: this might impact the directory structure when creating a directory with the same name as one still present on the bricks. Hence moving this back to the 3.2 release.
Thanks & regards,
Karan Sandha
Was able to figure out the issue, with Karan's and Pranith's help, using simpler steps:
1. Create a 3-node distributed arbiter volume, 2 x (2+1) config (i.e. bricks 1 to 6).
2. mkdir explorer{1..10000} on the fuse client.
3. Start rm -rvf explorer*.
While step 3 is going on:
4. Kill brick1. The rm -rvf still continues.
5. Kill brick2. The rm -rvf fails for the current dirents of '/' with EROFS due to loss of quorum on replicate-0.
6. Start force the volume.
7. Self-heal is triggered.
8. Let the rm -rvf of step 3 run to completion.
9. ls on the mount shows some entries.
10. Do a second rm -rvf on the mount.
11. ls now shows no entries.
12. Once heal completes, the directories are still present on the bricks of replicate-0.

RCA:
What is happening is that when bricks are brought down during rmdir, AFR sets the dirty xattr on the bricks that are up. Later, when self-heal happens in step 7, if the dirty xattr is set, it triggers a conservative merge on the parent dir, re-creating the entries from brick1 on bricks 2 and 3. This BZ is another manifestation of BZ 1127178, where files re-appear due to conservative merge.
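The dirty xattr mentioned in the RCA can be observed directly on a brick's backend filesystem. A minimal sketch, assuming a hypothetical parent directory path on one of the brick servers (run on the server against the brick path, not against the fuse mount):

```shell
# Hypothetical brick-side path of the parent directory being healed.
PARENT=/bricks/brick0/gluster/some-parent-dir

# AFR's dirty xattr, set when a transaction did not fully complete on
# all bricks; a non-zero value marks the entry for self-heal:
getfattr -n trusted.afr.dirty -e hex "$PARENT"

# Dump all AFR changelog xattrs on the directory. With all per-brick
# pending counters at zero but trusted.afr.dirty non-zero, self-heal
# falls back to a conservative merge of the parent's entries:
getfattr -d -m 'trusted.afr.' -e hex "$PARENT"
```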
Edited the doc text slightly for the Release Notes.